39 research outputs found

Implicit feature detection for sentiment analysis

Author: Frasincar F. (Flavius)
Schouten K.I.M. (Kim)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Implicit feature detection is a promising research direction that has not seen much research yet. Based on previous work, where co-occurrences between notional words and explicit features are used to find implicit features, this research critically reviews its underlying assumptions and proposes a revised algorithm that directly uses the co-occurrences between implicit features and notional words. The revision is shown to perform better than the original method, but both methods are shown to fail in a more realistic scenario.

EUR Research Repository

Erasmus University Digital Repository
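
The co-occurrence idea described in the abstract of "Implicit feature detection for sentiment analysis" can be illustrated with a minimal sketch. It assumes training sentences that are already tokenized and annotated with one implicit feature each; the function names and the averaging score are hypothetical choices for illustration, not the scoring used in the paper.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Count word/feature co-occurrences.

    annotated: list of (tokens, implicit_feature) pairs (assumed input format).
    """
    word_count = Counter()          # number of sentences containing each word
    cooc = defaultdict(Counter)     # cooc[feature][word] = joint sentence count
    for tokens, feature in annotated:
        for w in set(tokens):
            word_count[w] += 1
            cooc[feature][w] += 1
    return word_count, cooc


def predict(tokens, word_count, cooc):
    """Return the feature with the highest average co-occurrence ratio.

    The score (mean of cooc / word_count over known words) is an
    illustrative assumption. Returns None if no word was seen in training.
    """
    words = [w for w in set(tokens) if word_count[w] > 0]
    if not words:
        return None
    scores = {
        feature: sum(counts[w] / word_count[w] for w in words) / len(words)
        for feature, counts in cooc.items()
    }
    return max(scores, key=scores.get)
```

Counting directly against annotated implicit features, rather than against explicit feature mentions, is the change the abstract attributes to the revised algorithm.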

Determining the most representative image on a Web page

Author: Frasincar F. (Flavius)
Vyas K. (Krishna)
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

We investigate how to determine the most representative image on a Web page. This problem has not been thoroughly investigated and, to date, only expert-based algorithms have been proposed in the literature. We attempt to improve the performance of known algorithms with the use of Support Vector Machines (SVM). In addition, our algorithm distinguishes itself from existing literature with the introduction of novel image features, including previously unused meta-data protocols. Also, we design and attempt a less-restrictive ranking methodology in the image preprocessing stage of our algorithm. We find that the application of the SVM framework with our improved classification methodology increases the F1 score from 27.2% to 38.5%, as compared to a state-of-the-art method. Introducing novel image features and applying backward feature selection, we find that the F1 score rises to 40.0%. Lastly, we use a class-weighted SVM in order to resolve the imbalance in the number of representative images. This final modification improves the classification performance of our algorithm even further to 43.9%, outperforming our benchmark algorithms, including those of Facebook and Google. Suggested beneficiaries are the search engine and image retrieval communities, including the commercial sector, owing to the superior performance.

Erasmus University Digital Repository

A Temporal Web Ontology Language

Author: Frasincar F. (Flavius)
Kaymak U. (Uzay)
Milea V. (Viorel)
Publication venue: Milea, V. (Viorel)
Publication date: 01/01/2009

The Web Ontology Language (OWL) is the most expressive standard language for modeling ontologies on the Semantic Web. In this paper, we present a temporal extension of the very expressive fragment SHIN(D) of the OWL-DL language resulting in the tOWL language. Through a layered approach we introduce 3 extensions: i) Concrete Domains, which allows the representation of restrictions using concrete domain binary predicates, ii) Temporal Representation, which introduces timepoints, relations between timepoints, intervals, and Allen’s 13 interval relations into the language, and iii) TimeSlices/Fluents, which implements a perdurantist view on individuals and allows for the representation of complex temporal aspects, such as process state transitions. We illustrate the expressiveness of the newly introduced language by providing a TBox representation of Leveraged Buy Out (LBO) processes in financial applications and an ABox representation of one specific LBO.

EUR Research Repository

Erasmus University Digital Repository
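
Allen's 13 interval relations, mentioned in the abstract of "A Temporal Web Ontology Language", can be made concrete with a short sketch. It works on plain numeric (start, end) pairs and is not tOWL syntax; the function name and the relation labels as strings are assumptions for illustration.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b.

    a, b: (start, end) tuples with start < end (proper intervals).
    """
    (a1, a2), (b1, b2) = a, b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("intervals must satisfy start < end")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only a proper partial overlap remains at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

Exactly one of the 13 relations holds for any pair of proper intervals, which is why the function can return on the first matching test.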

Automatically Building Financial Sentiment Lexicons While Accounting for Negation

Author: Bos T. (Thomas)
Frasincar F. (Flavius)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/02/2021

Financial investors make trades based on available information. Previous research has proved that microblogs are a useful source for supporting stock market decisions. However, the financial domain lacks specific sentiment lexicons that could be utilized to extract the sentiment from these microblogs. In this research, we investigate automatic approaches that can be used to build financial sentiment lexicons. We introduce weighted versions of the Pointwise Mutual Information approaches to build sentiment lexicons automatically. Furthermore, existing sentiment lexicons often neglect negation while building the sentiment lexicons. In this research, we also propose two methods (Negated Word and Flip Sentiment) to extend the sentiment building approaches to take into account negation when constructing a sentiment lexicon. We build the financial sentiment lexicons by leveraging 200,000 messages from StockTwits. We evaluate the constructed financial sentiment lexicons in two different sentiment classification tasks (unsupervised and supervised). In addition, the created financial sentiment lexicons are compared with each other and with other existing sentiment lexicons. The best performing financial sentiment lexicon is built by combining our Weighted Normalized Pointwise Mutual Information approach with the Negated Word appro

Erasmus University Digital Repository
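
The abstract of "Automatically Building Financial Sentiment Lexicons While Accounting for Negation" builds on Pointwise Mutual Information (PMI). As a rough sketch of the underlying idea only, the code below scores each word by the difference between its PMI with a positive label and its PMI with a negative label. It is plain, unweighted PMI; the weighting and negation handling the abstract refers to are not reproduced here, and the function name, the "pos"/"neg" labels, and the neutral fallback for unseen pairs are assumptions.

```python
import math
from collections import Counter


def pmi_lexicon(messages, min_count=1):
    """Build a word -> sentiment score lexicon from labeled messages.

    messages: list of (tokens, label) pairs, label in {"pos", "neg"}
    (assumed input format). Score = PMI(word, pos) - PMI(word, neg).
    """
    n = len(messages)
    label_count = Counter(label for _, label in messages)
    word_count = Counter()
    joint = Counter()
    for tokens, label in messages:
        for w in set(tokens):
            word_count[w] += 1
            joint[(w, label)] += 1

    def pmi(w, label):
        j = joint[(w, label)]
        if j == 0:
            # Assumption: treat unseen (word, label) pairs as neutral
            # instead of returning negative infinity.
            return 0.0
        # log2( p(w, label) / (p(w) * p(label)) ), probabilities as counts / n
        return math.log2(j * n / (word_count[w] * label_count[label]))

    return {
        w: pmi(w, "pos") - pmi(w, "neg")
        for w, c in word_count.items()
        if c >= min_count
    }
```

A positive score suggests the word appears more often than chance in positive messages, and a negative score the opposite.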

A lexical approach for taxonomy mapping

Author: Frasincar F. (Flavius)
Nederstigt L.J. (Lennart)
Vandic D. (Damir)
Publication date: 01/03/2016

Obtaining a useful complete overview of Web-based product information has become difficult nowadays due to the ever-growing amount of information available on online shops. Findings from previous studies suggest that better search capabilities, such as the exploitation of annotated data, are needed to keep online shopping transparent for the user. Annotations can, for example, help present information from multiple sources in a uniform manner. In order to support the product data integration process, we propose an algorithm that can autonomously map heterogeneous product taxonomies from different online shops. The proposed approach uses word sense disambiguation techniques, approximate lexical matching, and a mechanism that deals with composite categories. Our algorithm’s performance compared favorably against two other state-of-the-art taxonomy mapping algorithms on three real-life datasets. The results show that the F1-measure for our algorithm is on average 60% higher than a state-of-the-art product taxonomy mapping algorithm

Erasmus University Digital Repository

Using linguistic graph similarity to search for sentences in news articles

Author: Frasincar F. (Flavius)
Schouten K.I.M. (Kim)
Publication venue: 'IOS Press'
Publication date: 01/01/2016

With the volume of daily news growing to sizes too big to handle for any individual human, there is a clear need for effective search algorithms. Since traditional bag-of-words approaches ignore much of the information that is embedded in the structure of the text, they are inherently limited; in this paper, we therefore propose a linguistic approach to search called Destiny. With Destiny, sentences, both from news items and the user queries, are represented as graphs where the nodes represent the words in the sentence and the edges represent the grammatical relations between the words. The proposed algorithm is evaluated against a TF-IDF baseline using a custom corpus of user-rated sentences. Destiny significantly outperforms TF-IDF in terms of Mean Average Precision, normalized Discounted Cumulative Gain, and Spearman's Rho.

EUR Research Repository

Erasmus University Digital Repository

Ontology population from web product information

Author: Aanen S.S. (Steven)
Frasincar F. (Flavius)
Hogenboom F.P. (Frederik)
Nederstigt L.J. (Lennart)
Vandic D. (Damir)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/04/2014

With the vast amount of information available on the Web, there is an increasing need to structure Web data in order to make it accessible to both users and machines. E-commerce is one of the areas in which growing data congestion on the Web has serious consequences. This paper proposes a framework that is capable of populating a product ontology using tabular product information from Web shops. By formalizing product information in this way, better product comparison or recommendation applications could be built. Our approach employs both lexical and syntactic matching for mapping properties and instantiating values. The performed evaluation shows that instantiating consumer electronics from Best Buy and Newegg.com results in an F1 score of approximately 77%.

EUR Research Repository

Erasmus University Digital Repository

News recommendation with CF-IDF+

Author: de Koning E. (Emma)
Frasincar F. (Flavius)
Hogenboom F.P. (Frederik)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Traditionally, content-based recommendation is performed using term occurrences, which are leveraged in the TF-IDF method. This method is the de facto s

EUR Research Repository

Erasmus University Digital Repository

Review-aggregated aspect-based sentiment analysis with ontology features

Author: de Kok S. (Sophie)
Frasincar F. (Flavius)
Punt L. (Linda)
Ranta K. (Karoliina)
Schouten K.I.M. (Kim)
van den Puttelaar R. (Rosita)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

With all the information that is available on the World Wide Web, there is great demand for data mining techniques and sentiment analysis is a particularly popular domain, both in business and research. Sentiment analysis aims to determine the sentiment value, often on a positive–negative scale, for a given product or service based on a set of textual reviews. As fine-grained information is more useful than just a single overall score, modern aspect-based sentiment analysis techniques break down the sentiment and assign sentiment scores to various aspects of the product or service mentioned in the review. In this work, we focus on aspect-based sentim

EUR Research Repository

Erasmus University Digital Repository

Sentiment analysis of multiple implicit features per sentence in consumer review data

Author: Den Ridder R. (Rick)
Dosoula N. (Nikoleta)
Frasincar F. (Flavius)
Griep R. (Roel)
Schouten K.I.M. (Kim)
Slangen R. (Rick)
Van Luijk R. (Ruud)
Publication venue: 'IOS Press'
Publication date: 01/01/2016

With the rise of e-commerce, online consumer reviews have become crucial for consumers' purchasing decisions. Most of the existing research focuses on the detection of explicit features and sentiments in such reviews, thereby ignoring all that is reviewed implicitly. This study builds, in extension of an existing implicit feature algorithm that can only assign one implicit feature to each sentence, a classifier that predicts the presence of multiple implicit features in sentences. The classifier makes its prediction based on a custom score function and a trained threshold. Only if this score exceeds the threshold do we allow for the detection of multiple implicit features. In this way, we increase the recall while limiting the decrease in precision. In the more realistic scenario, the classifier-based approach improves the F1-score from 62.9% to 64.5% on a restaurant review data set. The precision of the computed sentiment associated with the detected features is 63.9%.

Erasmus University Digital Repository
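
The gating step described in the abstract of "Sentiment analysis of multiple implicit features per sentence in consumer review data" can be pictured with a small sketch. The abstract does not specify the custom score function, so the sentence-level score is taken as an input here; the function name, both thresholds, and the fallback to the single best-scoring feature are assumptions for illustration.

```python
def detect_features(feature_scores, multi_score, multi_threshold, feature_threshold):
    """Pick one or several implicit features for a sentence.

    feature_scores: dict mapping feature -> score for this sentence (assumed given).
    multi_score: output of a sentence-level score function (hypothetical input).
    multi_threshold: trained threshold that gates multi-feature detection.
    feature_threshold: hypothetical per-feature cut-off used once the gate opens.
    """
    ranked = sorted(feature_scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    best_feature = ranked[0][0]
    if multi_score > multi_threshold:
        # Gate open: keep every feature that clears the per-feature cut-off.
        picked = [f for f, s in ranked if s >= feature_threshold]
        return picked or [best_feature]
    # Gate closed: behave like a single-feature detector.
    return [best_feature]
```

Opening the gate only for sentences that clear the threshold is what lets recall rise without admitting extra features for every sentence.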